feat(compression): enable dictionary compression in pure Rust backend by polaz · Pull Request #229 · structured-world/coordinode-lsm-tree

polaz · 2026-04-07T16:48:36Z

Summary

Implements `compress_with_dict()` in `ZstdPureProvider` using `FrameCompressor` from `structured-zstd` v0.0.11 (feat: dictionary builder — FastCOVER algorithm and dictionary finalization structured-zstd#25)
Removes the C FFI `zstd` crate entirely; `structured-zstd` is now the sole backend under the `zstd` feature flag
`zstd-pure` becomes a deprecated alias (`zstd-pure = ["zstd"]`) — enabling it is equivalent to enabling `zstd`
Supports both finalized zstd dictionaries (magic bytes `37 A4 30 EC` + entropy tables) and raw content dictionaries
TLS caching: single-entry `FrameCompressor` / `FrameDecoder` per thread, keyed by 64-bit xxh3 fingerprint + level
`strip_dict_id`: rewritten in-place (`get_mut` + `copy_within` + `truncate`) — eliminates O(frame_len) allocation per compressed block
Bug fix: `decode_raw_content_bounded` with `capacity=0` and an empty frame no longer incorrectly returns `DecompressedSizeTooLarge`; uses `remaining.max(1)` in `UptoBytes` so the decoder advances past the empty Last_Block before capacity is checked
CI: `test-zstd-pure` → `test-zstd`, matrix extended to include MSRV 1.92.0

Technical Details

Format detection: `compress_with_dict` and `decompress_with_dict` check for the zstd dictionary magic prefix (bytes `37 A4 30 EC`, little-endian `0xEC30A437`):

Finalized dict → `Dictionary::decode_dict` (entropy tables + content)
Raw content dict → `Dictionary::from_raw_content` with ID = lower 32 bits of xxh3, clamped to ≥1 (id=0 is reserved in the zstd frame format)

Raw-content dict ID stripping: `compress_with_dict` strips the synthetic dictID from the frame header after compression. This matches the zstd standard convention where `dictID=0` (absent) means "raw-content dict, id unknown, accept any", preventing decompressors from requiring the specific synthetic id.

Decompression-bomb guard: The raw-content dict decompress path calls `decoder.content_size()` after `init()` and rejects frames whose declared size exceeds the caller's capacity limit before allocating the output buffer. Frames without the FCS field fall back to the post-decode check.

`ZstdDictionary::id()`: Returns the raw lower 32 bits of xxh3 (may theoretically be 0). Config validation paths compare `dict.id()` against on-disk `dict_id` — both sides derive the same value, so validation is unaffected. The `.max(1)` clamp is only applied inside the backend when embedding an id in a zstd frame header.

Blocker resolved: structured-world/structured-zstd#25 (FastCOVER + dictionary finalization) merged; `structured-zstd` bumped to v0.0.11 (encoding performance improvements: row-based match finder, HC positions rebase, streaming scratch buffer reuse, FSE decoder packing, and HC table improvements).

Known Limitations

TLS compressor/decoder is a single-entry memoizer; switching dictionaries/levels on the same thread causes a re-parse — tracked in perf(compression): expand TLS compressor/decoder cache to multi-entry keyed map #231
Blob-file dictionary compression remains unsupported — tracked in feat(vlog): dictionary compression for blob files #230
Decompression throughput ~2–3.5× slower than C reference (pre-existing)

Test Plan

`src/compression/zstd_pure.rs` — roundtrip, magic detection, all levels, empty input, raw content dict, capacity guard, in-place strip_dict_id, empty-frame-at-capacity-0 regression
`tests/zstd_dict_roundtrip.rs` — full Tree write/flush/read, range scan, reopen, missing/wrong dict errors, finalized dict, encryption, per-level policy
`tests/zstd_dict_roundtrip.rs` — compaction path: 3 L0 SSTs flushed → `major_compact` → 300 keys readable + range scan (exercises both `compress_with_dict` and `decompress_with_dict` on the compaction hot path)
`tests/zstd_dict_roundtrip.rs` — reopen with wrong dict fails at recovery (`ZstdDictMismatch` on first SST read)
`cargo nextest run --no-default-features --features zstd,lz4`: 1208 passed
`cargo nextest run --all-features`: 1256 passed
`cargo clippy --all-features -- -D warnings`: clean
`cargo clippy --no-default-features --features zstd,lz4 --all-targets -- -D warnings`: clean
CI job `test-zstd` (renamed from `test-zstd-pure`) runs on `[stable, "1.92.0"]`

Closes #218

Summary by CodeRabbit

New Features
- Added support for zstd dictionary compression and decompression (pure-Rust backend).
Documentation
- Clarifies zstd now uses a pure‑Rust backend (no C toolchain), documents performance tradeoffs, current limitations, and marks the old alias as deprecated.
Tests
- Substantially expanded unit and integration tests for dictionary behaviors and round‑trip/compaction scenarios.
Chores
- CI and benchmark workflow updates; dependency and build-script feature wiring adjusted.

- Implement `compress_with_dict()` in `ZstdPureProvider` using `FrameCompressor::set_dictionary()` from `structured-zstd` v0.0.7, which added dictionary encoder support in structured-world/structured-zstd#25 - Support both finalized zstd dictionaries (magic 0x37A430EC + entropy tables, parsed via `decode_dict`) and raw content dictionaries (bare byte sequences, parsed via `from_raw_content` with xxh3-derived ID), matching the transparent handling in the C FFI backend - Remove the config-time guard that rejected `ZstdDict` compression policies under the pure backend — the feature is now fully supported - Update `decompress_with_dict` to apply the same format-detection logic so raw content dicts round-trip correctly end-to-end Unit tests (src/compression/zstd_pure.rs): - compress_with_dict_roundtrip_pure_to_pure - compress_with_dict_produces_zstd_magic - compress_with_dict_roundtrip_all_levels (levels 1, 3, 9, 19) - compress_with_dict_empty_dict_returns_error - compress_with_dict_raw_content_dict_works - compress_with_dict_empty_plaintext_roundtrips Integration tests (tests/zstd_dict_roundtrip.rs, zstd_pure_dict mod): - pure_tree_write_flush_read_zstd_dict (full Tree write→flush→read) - pure_tree_range_scan_with_zstd_dict - pure_tree_reopen_with_dict_reads_back_correctly - pure_zstd_dict_missing_returns_error Closes #218

coderabbitai · 2026-04-07T16:48:52Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 08b341ed-927a-4b05-9a14-033d60327186

📥 Commits

Reviewing files that changed from the base of the PR and between 5f2e3a0 and eb34b7f.

📒 Files selected for processing (1)

src/compression/zstd_pure.rs

📝 Walkthrough

Walkthrough

Replaced the C-FFI zstd backend with a single pure‑Rust structured‑zstd backend; implemented dictionary compression/decompression (finalized vs raw-content) with TLS caching and compressor cache; simplified ZstdDictionary API; removed zstd_ffi; updated Cargo/features, build script, CI, README, benchmarks, and tests.

Changes

Cohort / File(s)	Summary
Feature & dependency wiring `Cargo.toml`, `build.rs`	`zstd` now maps to optional `structured-zstd`; `zstd-pure` aliased to `zstd`; removed separate optional `zstd` dependency entry; build.rs emits `zstd_any` only when `CARGO_FEATURE_ZSTD` is set.
Pure Rust backend (dictionary support) `src/compression/zstd_pure.rs`	Added DICT_MAGIC detection, `strip_dict_id`, bounded raw-content decoding, `do_decompress_with_dict`, full `compress_with_dict` implementation (FrameCompressor), TLS single-entry compressor cache keyed by (xxh3_64(dict_raw), level), raw vs finalized-dict paths, and extensive unit tests.
Compression module & types `src/compression/mod.rs`	Removed C-FFI backend selection; `ZstdBackend` now resolves to pure‑Rust provider under `zstd`; removed prepared decoder field and accessor from `ZstdDictionary`; adjusted docs and `id64` cfg visibility.
Removed C‑FFI backend `src/compression/zstd_ffi.rs`	Deleted the C-FFI zstd backend and its `ZstdFfiProvider` implementation.
Configuration validation `src/config/mod.rs`	`Config::validate_zstd_dictionary()` no longer rejects `ZstdDict` at compile-time under pure-Rust; now computes `dict_id` and validates presence/match for referenced dictionaries.
CI / Coverage / Benchmarks `.github/workflows/coordinode-ci.yml`, `.github/workflows/benchmark.yml`, `benches/zstd_dict.rs`	Added `test-zstd` job and zstd-specific coverage run; updated benchmark checkout to `actions/checkout@v6`; renamed benchmark label to `tls_hit` and adjusted comments.
Docs & README `README.md`	Updated feature-flag docs: `zstd` described as pure‑Rust `structured-zstd` backend (no C deps), performance note added; `zstd-pure` deprecated as alias.
Tests / Integration `tests/zstd_dict_roundtrip.rs`	Added integration test `zstd_dict_survives_major_compaction` verifying Zstd-dict compression survives major compaction and validates reads and range scans.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant ZstdPure as ZstdPureProvider
    participant TLSCache
    participant Compressor as FrameCompressor
    participant Decoder as FrameDecoder

    rect rgba(200,200,255,0.5)
    Caller->>ZstdPure: compress_with_dict(input, dict_raw, level)
    ZstdPure->>TLSCache: lookup (xxh3_64(dict_raw), level)
    TLSCache-->>ZstdPure: compressor (create if missing)
    ZstdPure->>Compressor: compress(input, dict)
    Compressor-->>ZstdPure: compressed_frame
    ZstdPure->>ZstdPure: if raw-content dict → strip_dict_id(frame)
    ZstdPure-->>Caller: compressed_frame
    end

    rect rgba(200,255,200,0.5)
    Caller->>ZstdPure: decompress_with_dict(frame, dict_raw, capacity)
    ZstdPure->>ZstdPure: detect dict format (DICT_MAGIC / raw)
    alt raw-content path
        ZstdPure->>Decoder: init with force_dict + bounded decode
        Decoder-->>ZstdPure: decompressed_bytes / size_error
    else finalized-dict path
        ZstdPure->>Decoder: decode_all_to_vec(finalized)
        Decoder-->>ZstdPure: decompressed_bytes
    end
    ZstdPure-->>Caller: decompressed_bytes / error
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

#218: feat(compression): enable dictionary compression in pure Rust backend — This PR implements pure‑Rust dictionary compression/decompression requested by that issue.
#25: feat: dictionary builder — FastCOVER algorithm and dictionary finalization — The PR consumes finalized-dictionary format expectations (finalized dicts used by the compressor).
#232: (prepared-decoder caching proposal) — The PR replaces the prepared-field with TLS caching keyed by a 64-bit fingerprint, directly addressing the related caching/design surface.

Possibly related PRs

perf(compression): use numeric zstd levels in pure Rust backend #226 — Overlapping changes in the pure‑Rust zstd backend and compressor-level wiring.
feat(compression): CompressionProvider trait + pure Rust zstd backend #176 — Prior work touching compression backend selection and zstd modules.
perf(compression): cache pre-compiled Dictionary across block decompress calls #227 — Related edits to ZstdDictionary shape and TLS cache semantics.

Poem

🐇
I hopped from FFI hollows to structured‑zstd bright,
Pure Rust frames now hum through day and night,
Dicts compress and decompress with care,
TLS caches warm, IDs stripped in the air,
Tests clap their paws — round trips pass just right.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(compression): enable dictionary compression in pure Rust backend' accurately describes the main change—implementing compress_with_dict() in ZstdPureProvider using structured-zstd.
Linked Issues check	✅ Passed	The PR fully addresses all coding requirements from linked issues: `#25` (FastCOVER/dictionary finalization in upstream structured-zstd, now resolved) and `#218` (implement compress_with_dict in pure backend with round-trip compatibility and integration tests).
Out of Scope Changes check	✅ Passed	All changes are directly related to enabling dictionary compression in the pure Rust backend: structured-zstd bump, C-FFI backend removal, ZstdPureProvider implementation updates, test additions, and CI/build configuration adjustments are all in scope.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/#218-featcompression-enable-dictionary-compression-in-p

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-04-07T16:51:26Z

Codecov Report

❌ Patch coverage is 97.39583% with 5 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/compression/zstd_pure.rs	97.39%	5 Missing ⚠️

📢 Thoughts on this report? Let us know!

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compression/mod.rs`:
- Around line 38-43: The documentation for the compression function is too
strict: update the doc comment that currently says "must be a finalized zstd
dictionary" to state that dict_raw accepts both finalized zstd dictionaries
(magic 0x37A430EC header, entropy tables, content) and bare content dictionaries
as produced/returned by ZstdDictionary::new() / ZstdDictionary::raw(); mention
that both the C FFI backend and the pure Rust backend accept both formats so
callers know either form is valid.

In `@src/compression/zstd_pure.rs`:
- Around line 111-118: The frame encoder currently clamps the xxh3 low-32-bit
hash to 1 locally (let id = { xxh3_64(dict_raw) as u32; h.max(1) }) causing
mismatch with ZstdDictionary::id() and CompressionType::ZstdDict; instead
compute a single normalized dict id from the raw hash in the shared
dictionary-ID path (e.g., produce raw_id = (xxh3_64(dict_raw) as u32) and
normalized_id = raw_id.max(1)) and use normalized_id everywhere metadata,
ZstdDictionary::id(), CompressionType::ZstdDict, and frame encoding read/write
IDs; remove the local h.max(1) clamping in the encoder and apply the same change
for the other occurrence around lines 179-183 so both metadata and frame headers
observe the identical normalized id.

In `@tests/zstd_dict_roundtrip.rs`:
- Around line 270-271: The zstd_pure_dict module is gated with #[cfg(all(feature
= "zstd-pure", not(feature = "zstd")))] and will be skipped by cargo test
--all-features, so add a dedicated CI job that runs tests with the pure backend
enabled (e.g., run cargo test --no-default-features --features zstd-pure or
cargo test --features zstd-pure while ensuring the "zstd" feature is not
enabled) so the zstd_pure_dict module is exercised; update your CI workflow to
include this new job (name it clearly, e.g., "test: zstd-pure") and ensure it
uses the same test matrix/platforms as the other test jobs.
- Around line 278-291: The current make_test_dictionary() only wraps raw sample
bytes with ZstdDictionary::new(), so tests never exercise the
finalized-dictionary path; change or add a test fixture that actually
trains/finalizes a dictionary (use the library's training/finalize API instead
of ZstdDictionary::new()), then use that finalized dictionary in an end-to-end
encode/decode test to exercise Dictionary::decode_dict and
FrameCompressor::set_dictionary_from_bytes; locate make_test_dictionary and
replace or add a make_finalized_test_dictionary that invokes the training
function (or calls the dictionary finalization method) on the same samples and
return the finalized ZstdDictionary for the new test.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9f444fc6-12f8-4148-ae09-8969ce781690

📥 Commits

Reviewing files that changed from the base of the PR and between ab61d33 and 746a513.

📒 Files selected for processing (5)

README.md
src/compression/mod.rs
src/compression/zstd_pure.rs
src/config/mod.rs
tests/zstd_dict_roundtrip.rs

💤 Files with no reviewable changes (2)

README.md
src/config/mod.rs

Copilot

Pull request overview

This PR enables zstd dictionary compression for the zstd-pure backend (structured-zstd), removing the previous config-time rejection and adding unit + integration tests to exercise dictionary-compressed Tree roundtrips under the pure backend.

Changes:

Implement compress_with_dict() for ZstdPureProvider, including finalized-vs-raw dictionary format detection.
Remove the Config::open() guard that previously rejected CompressionType::ZstdDict under zstd-pure.
Add new unit tests (src/compression/zstd_pure.rs) and integration tests (tests/zstd_dict_roundtrip.rs) for pure-backend dictionary compression.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`src/compression/zstd_pure.rs`	Implements pure-backend dictionary compression and expands unit test coverage.
`src/config/mod.rs`	Removes the pure-backend dictionary-compression rejection during config validation.
`src/compression/mod.rs`	Updates trait-level docs to reflect dictionary support in the pure backend.
`tests/zstd_dict_roundtrip.rs`	Adds integration tests for dict compression via the pure backend through the Tree API.
`README.md`	Updates documentation by removing the old “dict compression unsupported” limitation for `zstd-pure`.

…-zero - Move .max(1) into ZstdDictionary::id() so config validation, stored metadata, and zstd frame headers all observe the same dict id (id=0 is invalid in the zstd frame format; edge case when xxh3 lower 32 bits are zero) - Remove redundant .max(1) from decompress_with_dict raw-content path - Update compress_with_dict trait doc to reflect that both finalized and raw-content dictionaries are accepted by both backends - Add dedicated test-zstd-pure CI job to exercise the pure backend independently (--all-features enables C zstd and skips the not(feature = "zstd") gate) - Add pure_finalized_dict_roundtrip integration test to cover the Dictionary::decode_dict path in the pure Rust backend - Correct make_test_dictionary() doc: builds a raw-content dict, not a finalized-dictionary fixture

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/coordinode-ci.yml:
- Around line 62-80: Add a new step in the existing test-zstd-pure job to run
clippy under the pure-only feature combo: insert a step before the "Run tests"
step that runs `cargo clippy --no-default-features --features zstd-pure,lz4
--all-targets -- -D warnings`; this ensures code behind #[cfg(all(feature =
"zstd-pure", not(feature = "zstd")))] (e.g. src/compression/zstd_pure.rs and
tests/zstd_dict_roundtrip.rs) is linted and won't introduce warnings.

In `@src/compression/mod.rs`:
- Around line 158-176: The current normalization in ZstdDictionary::id() forces
id=0 to 1 and breaks read compatibility: revert or limit that normalization so
existing stored metadata with dict_id == 0 can still be recognized;
specifically, keep ZstdDictionary::id64() unchanged, change ZstdDictionary::id()
to return the raw lower 32 bits (including 0) for read/validation paths and
perform the clamp-to-1 only in write/encoding code paths that emit new frame
headers (i.e., where CompressionType::ZstdDict is serialized), or alternatively
make validation logic that compares dictionaries accept both raw 0 and
normalized 1 as equivalent for the same dictionary; update the code paths that
currently call ZstdDictionary::id() for config validation and block/table
open-time checks to use the raw value or the tolerant comparison, and add a
regression test that writes metadata with dict_id == 0 and verifies it can be
read back without triggering ZstdDictMismatch.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d7702fe5-b3ec-47af-9ebb-251245a313de

📥 Commits

Reviewing files that changed from the base of the PR and between 746a513 and aacc1f1.

📒 Files selected for processing (4)

.github/workflows/coordinode-ci.yml
src/compression/mod.rs
src/compression/zstd_pure.rs
tests/zstd_dict_roundtrip.rs

- Add TLS FrameCompressor cache to compress_with_dict in pure Rust backend, matching the existing FrameDecoder TLS cache in decompress_with_dict - Cache key: (xxh3_64(dict_raw), level); dictionary is parsed at most once per thread per distinct (dict, level) pair - Use owned Cursor<Vec<u8>> source and Vec<u8> drain for 'static TLS compat - Revert ZstdDictionary::id() to return raw lower-32-bits of xxh3 (no clamping); backends that embed dict_id in the zstd frame header clamp to .max(1) themselves to preserve backward compat with on-disk dict_id=0 (theoretical) - Update id() doc to clarify that clamping is the backend's responsibility - Restore dict.id().max(1) in decompress_with_dict raw-content path - Add cargo clippy step for pure backend in test-zstd-pure CI job - Update benchmark.yml actions/checkout from v4 to v6

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/compression/mod.rs:178

PR description says ZstdDictionary::id() is normalized to always return ≥1, but the current implementation returns the raw lower 32 bits of xxh3 and the docs explicitly note it may be 0. Since id=0 is reserved/invalid in zstd frame headers, leaving id() able to return 0 means on-disk CompressionType::ZstdDict{dict_id} can be 0 while the pure backend clamps the frame header dict ID to ≥1 for raw-content dictionaries. Please either implement the promised normalization (and adjust docs/tests accordingly) or update the PR description to match the intended semantics.

    /// Returns a 32-bit dictionary fingerprint (lower 32 bits of xxh3).
    ///
    /// Intended for config validation (matching a `CompressionType::ZstdDict`
    /// `dict_id` against the supplied `ZstdDictionary`) and external interop.
    ///
    /// The value is the raw lower 32 bits of xxh3 and may theoretically be `0`
    /// (probability ≈ 1/2³²). Backends that embed a dict ID in the zstd frame
    /// header (where id=0 is reserved) are responsible for clamping to at
    /// least 1 themselves. Config validation is unaffected: both sides derive
    /// the ID from the same bytes and therefore agree even in the zero case.
    ///
    /// For internal cache keying use [`id64`](ZstdDictionary::id64) to avoid
    /// hash collisions.
    #[must_use]
    #[expect(
        clippy::cast_possible_truncation,
        reason = "intentional: public API returns 32-bit fingerprint"
    )]
    pub fn id(&self) -> u32 {
        self.id as u32
    }

- Fix module-level doc: remove reference to set_dictionary_from_bytes (implementation uses set_dictionary with a parsed Dictionary) - Reuse the exhausted Cursor<Vec<u8>> source buffer across compress() calls via take_source().map_or_else(...) to avoid a per-block O(data.len()) allocation after the first call on a given thread; drain ownership is transferred to the caller so its capacity cannot be recovered without an extra copy

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compression/zstd_pure.rs`:
- Line 130: Extract the duplicated DICT_MAGIC array into a single module-level
constant and use it wherever the inline definition appears: add a top-level
const DICT_MAGIC: [u8; 4] = [0x37, 0xA4, 0x30, 0xEC] (with doc comment) near the
file imports, then remove the inline definitions in the functions (e.g., the
current local DICT_MAGIC in decompress_with_dict) and replace their uses with
the new module-level DICT_MAGIC constant so the magic value is defined once.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9739d26c-0a93-4be1-baaf-f423a58fe9c8

📥 Commits

Reviewing files that changed from the base of the PR and between 7650e6b and 1aec535.

📒 Files selected for processing (1)

src/compression/zstd_pure.rs

polaz · 2026-04-07T19:39:57Z

@coderabbitai full review

coderabbitai · 2026-04-07T19:40:04Z

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compression/zstd_pure.rs`:
- Around line 146-159: The test suite is missing a cross-backend regression test
for raw-content dictionaries: add a test that creates a raw-content dictionary
(use Dictionary::from_raw_content with the synthetic id logic based on dict_key
and DICT_MAGIC/Dictionary::decode_dict semantics), compresses data using the
pure backend, then attempts to decompress the resulting frame using the FFI
backend to assert successful round-trip; ensure the test exercises the path
where dict_raw does not start with DICT_MAGIC so the synthetic id branch (the id
computed from dict_key) is used, and verify decompression succeeds despite the
differing dictID treatment between backends.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 027cfd5b-791d-4f90-a206-b3c0f3d2f603

📥 Commits

Reviewing files that changed from the base of the PR and between ab61d33 and 59bd591.

📒 Files selected for processing (7)

.github/workflows/benchmark.yml
.github/workflows/coordinode-ci.yml
README.md
src/compression/mod.rs
src/compression/zstd_pure.rs
src/config/mod.rs
tests/zstd_dict_roundtrip.rs

💤 Files with no reviewable changes (2)

README.md
src/config/mod.rs

The pure Rust backend (structured-zstd) rejects dictID=0 for Dictionary::from_raw_content, so it internally assigns a synthetic non-zero ID derived from xxh3_64(dict_raw). The C zstd library always records dictID=0 (absent) for raw-content dicts, causing two failures: pure → FFI: pure embeds synthetic dictID in frame header; libzstd ZSTD_DDict has id=0; decompressor checks dctx->dictID(0) != fParams.dictID(synthetic) → "Dictionary mismatch". FFI → pure: FFI records dictID=0 in frame; structured-zstd treats dictID=0 as "no dict required" and skips dict lookup, then fails decompression with NotEnoughBytesInDictionary. Fix the pure backend in both directions: compress_with_dict (raw-content): after compression, strip the Dict_ID field from the zstd frame header via strip_dict_id(). The synthetic ID is an internal detail — removing it aligns the output with the C convention (dictID absent/0), so the C FFI decompressor skips the ID check and accepts the frame. decompress_with_dict (raw-content): replace decode_all_to_vec (which calls init() internally and skips dict loading when dictID=0) with the manual init() → force_dict() → decode_blocks() → collect() flow. force_dict() loads the raw-content dict regardless of the frame's dictID field, handling all three cases: - Frame from C FFI backend (dictID absent): force_dict loads dict. - Frame from new pure backend (dictID stripped): same. - Frame from old pure backend (dictID=synthetic): force_dict reloads same dict (idempotent). Add cross-backend interoperability tests gated on #[cfg(all(test, feature = "zstd", feature = "zstd-pure"))]: pure_compress_ffi_decompress_raw_content_dict_roundtrip ffi_compress_pure_decompress_raw_content_dict_roundtrip mod zstd_pure is now compiled under #[cfg(feature = "zstd-pure")] (previously not(feature = "zstd")) so it is visible to the cross-backend test module when --all-features is used. Closes #218

sw-release-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'lsm-tree db_bench'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.15.

Benchmark suite	Current: `ffd9512`	Previous: `ab61d33`	Ratio
`mergerandom`	`261870.7610778383` ops/sec (normalized)	`380260.6261186645` ops/sec (normalized)	`1.45`

This comment was automatically generated by workflow using github-action-benchmark.

CC: @polaz

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compression/zstd_pure.rs`:
- Around line 151-174: The TLS variables (e.g., TLS_COMPRESSOR holding
Option<(u64, i32, CachedCompressor)>) implement a single-entry memoizer contrary
to the comment; replace the Option with a small keyed cache (HashMap or tiny
LRU) keyed by (dict_hash: u64, level: i32) so a thread can store multiple
CachedCompressor entries and avoid re-parsing when switching
dictionaries/levels; update the thread_local! RefCell<Option<...>> declarations
to RefCell<HashMap<(u64, i32), CachedCompressor>> or a bounded LRU map and apply
the same change to the analogous TLS cells referenced in the diff (the decoder
TLS instances), ensuring lookups insert/evict correctly and keys use
dict.id64()/level as described in the comments.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6b0df8e2-09ae-4eda-a5fd-d5d8f7ca0431

📥 Commits

Reviewing files that changed from the base of the PR and between 59bd591 and 9ce1517.

📒 Files selected for processing (2)

src/compression/mod.rs
src/compression/zstd_pure.rs

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compression/mod.rs`:
- Around line 110-111: The doc comment for the handle is incorrect: `new()`
stores the full 64-bit xxh3 fingerprint in `self.id` (see `id64()`), while
`id()` truncates to the lower 32 bits; update the comment to state that the
handle stores the full 64‑bit xxh3 hash of `raw` and that `id()` returns the
lower 32 bits on demand (referencing `ZstdDictionary::id` and `id64()` for
clarity).

In `@tests/zstd_dict_roundtrip.rs`:
- Around line 269-272: The test currently masks failures by calling
tree.level_table_count(0).unwrap_or(0); instead, call level_table_count(0) and
assert on its success and result explicitly: retrieve the Result via
.expect(...) or .unwrap() with a clear message (e.g., on the call to
level_table_count), bind it to a variable (e.g., count) and then assert_eq!(0,
count, "L0 must be empty after major_compact — compaction may not have run");
this ensures failures to fetch the L0 count fail the test rather than being
coerced to zero.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b978752-95b1-42f5-bebc-a380b250c4d8

📥 Commits

Reviewing files that changed from the base of the PR and between ef55325 and dd0275f.

📒 Files selected for processing (4)

.github/workflows/coordinode-ci.yml
src/compression/mod.rs
src/compression/zstd_pure.rs
tests/zstd_dict_roundtrip.rs

- Fix test postcondition: assert_eq!(Some(0), level_table_count(0)) instead of unwrap_or(0) — properly distinguishes "L0 exists and is empty" from unexpected None (Copilot #54, CodeRabbit #56) - Fix ZstdDictionary::new() doc: handle stores full 64-bit xxh3 hash internally; id() returns lower 32 bits on demand (CodeRabbit #55)

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

… in decode_raw_content_bounded

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.

## 🤖 New release * `coordinode-lsm-tree`: 4.3.1 -> 4.4.0 <details><summary>Changelog</summary> <blockquote> ## [4.4.0](v4.3.1...v4.4.0) - 2026-04-09 ### Added - *(compression)* enable dictionary compression in pure Rust backend ([#229](#229)) ### Performance - *(compression)* cache pre-compiled Dictionary across block decompress calls ([#227](#227)) </blockquote> </details> --- This PR was generated with [release-plz](https://github.com/release-plz/release-plz/). Co-authored-by: sw-release-bot[bot] <255865126+sw-release-bot[bot]@users.noreply.github.com>

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread src/compression/mod.rs Outdated

Comment thread src/compression/zstd_pure.rs Outdated

Comment thread tests/zstd_dict_roundtrip.rs Outdated

Comment thread tests/zstd_dict_roundtrip.rs Outdated

polaz requested a review from Copilot April 7, 2026 17:13

Copilot started reviewing on behalf of polaz April 7, 2026 17:14 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread src/compression/mod.rs Outdated

Comment thread tests/zstd_dict_roundtrip.rs Outdated

polaz requested a review from Copilot April 7, 2026 17:30

Copilot started reviewing on behalf of polaz April 7, 2026 17:31 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread src/compression/mod.rs Outdated

Comment thread src/compression/zstd_pure.rs Outdated

Comment thread .github/workflows/coordinode-ci.yml Outdated

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread .github/workflows/coordinode-ci.yml Outdated

Comment thread src/compression/mod.rs Outdated

polaz commented Apr 7, 2026

View reviewed changes

Comment thread .github/workflows/coordinode-ci.yml Outdated

polaz requested a review from Copilot April 7, 2026 18:48

Copilot started reviewing on behalf of polaz April 7, 2026 18:49 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

Comment thread src/compression/zstd_pure.rs Outdated

Comment thread src/compression/zstd_pure.rs Outdated

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread src/compression/zstd_pure.rs Outdated

refactor(compression): extract DICT_MAGIC to module-level constant

59bd591

polaz requested a review from Copilot April 7, 2026 19:38

Copilot started reviewing on behalf of polaz April 7, 2026 19:39 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread src/compression/zstd_pure.rs

sw-release-bot Bot reviewed Apr 7, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 7, 2026

View reviewed changes

Comment thread src/compression/zstd_pure.rs

polaz requested a review from Copilot April 7, 2026 21:25

Copilot AI reviewed Apr 9, 2026

View reviewed changes

Comment thread tests/zstd_dict_roundtrip.rs Outdated

coderabbitai Bot reviewed Apr 9, 2026

View reviewed changes

Comment thread src/compression/mod.rs Outdated

Comment thread tests/zstd_dict_roundtrip.rs

polaz requested a review from Copilot April 9, 2026 07:55

Copilot started reviewing on behalf of polaz April 9, 2026 07:55 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

Comment thread src/compression/zstd_pure.rs

Comment thread Cargo.toml

docs(compression): document UptoBytes one-block over-decode behaviour…

eb34b7f

… in decode_raw_content_bounded

polaz requested a review from Copilot April 9, 2026 10:06

Copilot started reviewing on behalf of polaz April 9, 2026 10:07 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

polaz merged commit abd0ee5 into main Apr 9, 2026
23 checks passed

polaz deleted the feat/#218-featcompression-enable-dictionary-compression-in-p branch April 9, 2026 10:16

sw-release-bot Bot mentioned this pull request Apr 9, 2026

chore: release v4.4.0 #228

Merged

This was referenced Apr 9, 2026

Roadmap: coordinode-lsm-tree performance & structured-zstd integration #215

Open

feat(vlog): dictionary compression for blob files #233

Merged

Conversation

polaz commented Apr 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Technical Details

Known Limitations

Test Plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

Uh oh!

codecov Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

polaz commented Apr 7, 2026

Uh oh!

coderabbitai Bot commented Apr 7, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sw-release-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

⚠️ Performance Alert ⚠️

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

polaz commented Apr 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 7, 2026 •

edited

Loading

codecov Bot commented Apr 7, 2026 •

edited

Loading

sw-release-bot Bot left a comment •

edited

Loading